@yanxi0830
Contributor

@yanxi0830 yanxi0830 commented Mar 5, 2025

What does this PR do?

Test Plan

dataset

LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/integration/datasetio/

scoring

LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/scoring --text-model meta-llama/Llama-3.1-8B-Instruct --judge-model meta-llama/Llama-3.1-8B-Instruct

eval

LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/eval --text-model meta-llama/Llama-3.1-8B-Instruct --judge-model meta-llama/Llama-3.1-8B-Instruct

@facebook-github-bot facebook-github-bot added the CLA Signed label Mar 5, 2025
@yanxi0830 yanxi0830 changed the title from "tests (wip): revamp eval related integration tests" to "test(wip): revamp eval related integration tests" Mar 5, 2025
"role": "user",
"content": judge_input_msg,
}
UserMessage(
Contributor Author

bug uncovered from unit tests :)

@yanxi0830 yanxi0830 added this to the v0.1.6 milestone Mar 6, 2025
@yanxi0830 yanxi0830 changed the title from "test(wip): revamp eval related integration tests" to "test: revamp eval related integration tests" Mar 6, 2025
@yanxi0830 yanxi0830 marked this pull request as ready for review March 6, 2025 01:35
prompt_template=sample_judge_prompt_template,
judge_score_regexes=[r"Score: (\d+)"],

scoring_fn = scoring_fns_list[0]
Contributor

why the first one?

Contributor Author

@yanxi0830 yanxi0830 Mar 6, 2025

We test one scoring function per provider because braintrust has 10+ scoring functions, each making multiple LLM calls, and it's slow to loop over all of them.

Will look into adding mocks for scoring as well, so that the tests run in a reasonable time.
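
A rough sketch of the idea, not the code in this PR: pick the first scoring function for each provider, and stand in for the judge model with a mock so no LLM call is made. The provider names, function ids, and the helpers `pick_one_per_provider`, `parse_judge_score`, and `score_row` are all hypothetical; only the `Score: (\d+)` regex is taken from the diff above.

```python
import re
from unittest.mock import Mock

# Hypothetical provider -> scoring function ids (illustrative names only).
SCORING_FNS_BY_PROVIDER = {
    "provider_a": ["provider_a::fn_1", "provider_a::fn_2"],
    "provider_b": ["provider_b::fn_1", "provider_b::fn_2", "provider_b::fn_3"],
}

# Same pattern as judge_score_regexes in the diff.
JUDGE_SCORE_REGEXES = [r"Score: (\d+)"]


def pick_one_per_provider(fns_by_provider):
    """Keep only the first scoring function of each provider."""
    return {provider: fns[0] for provider, fns in fns_by_provider.items() if fns}


def parse_judge_score(text, regexes):
    """Return the first integer score matched by any regex, or None."""
    for pattern in regexes:
        match = re.search(pattern, text)
        if match:
            return int(match.group(1))
    return None


def score_row(judge, scoring_fn_id, row):
    """Ask the judge (real or mocked) for a verdict and parse its score."""
    reply = judge(scoring_fn_id, row)
    return parse_judge_score(reply, JUDGE_SCORE_REGEXES)


# The mock replaces the judge model, so scoring needs no network call.
mock_judge = Mock(return_value="The answer is mostly correct. Score: 4")

selected = pick_one_per_provider(SCORING_FNS_BY_PROVIDER)
row = {"input_query": "What is 2 + 2?", "generated_answer": "4"}
scores = {fn_id: score_row(mock_judge, fn_id, row) for fn_id in selected.values()}
```

With a mocked judge, looping over every scoring function would also become cheap, so the one-per-provider shortcut would mainly matter for runs against a live model.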

@yanxi0830 yanxi0830 merged commit bcb13c4 into main Mar 6, 2025
4 checks passed
@yanxi0830 yanxi0830 deleted the revive_eval_integration_test branch March 6, 2025 18:51
Development

Successfully merging this pull request may close these issues.

Migrate providers/tests into tests/api for evals, datasets, scorings API

4 participants